gh-139353: Add Objects/unicode_codecs_utf.c file by vstinner · Pull Request #142190 · python/cpython

vstinner · 2025-12-02T13:49:03Z

Rename functions:

ascii_decode() => _PyUnicode_DecodeASCII()
backslashreplace() => _PyUnicode_backslashreplace()
raise_encode_exception() => _PyUnicode_RaiseEncodeException()
unicode_decode_call_errorhandler_writer() => _PyUnicode_DecodeCallErrorHandler()
unicode_decode_utf8() => _PyUnicode_DecodeUTF8()
unicode_encode_call_errorhandler() => _PyUnicode_EncodeCallErrorHandler()
unicode_encode_utf8() => _PyUnicode_EncodeUTF8()
xmlcharrefreplace() => _PyUnicode_xmlcharrefreplace()

Move static inline functions and macros to pycore_unicodeobject.h:

_PyUnicode_CHECK()
_PyUnicode_UTF8()
PyUnicode_UTF8()
PyUnicode_SET_UTF8()
PyUnicode_UTF8_LENGTH()
PyUnicode_SET_UTF8_LENGTH()

Issue: Split large Objects/unicodeobject.c file into smaller files #139353

vstinner · 2025-12-03T15:39:59Z

@serhiy-storchaka: What do you think of this split?

serhiy-storchaka · 2025-12-03T16:14:22Z

I do not feel easy about this. The UTF codecs code is tightly coupled with other code. This PR makes some static functions non-static and exposes local functions in a header. This means that the compiler cannot completely inline them: it also needs to keep a non-inlined copy, and this can affect its decision to inline them. It also means that low-level C API which was previously not intended for use outside of the unicodeobject.c file can now be used in other CPython code, and it will be used, for sure. This will also affect optimization and maintainability.

If the goal of this change is to improve maintainability, I am not sure that its effect on maintainability is net positive.
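
For readers unfamiliar with the linkage concern raised above, here is a minimal sketch in plain C. It is not CPython code: `count_ascii_static`, `count_ascii_extern`, and the two caller functions are hypothetical names invented for illustration. The first helper has internal linkage, so the compiler sees every caller and may inline it and drop the standalone copy. The second has external linkage, so, absent link-time optimization or similar tooling, the compiler generally has to emit an out-of-line copy that other files could call, even if it also inlines the call locally.

```c
#include <assert.h>
#include <stddef.h>

/* Internal linkage: visible only in this translation unit.
 * The compiler knows every call site, so it is free to inline
 * the body everywhere and emit no standalone function. */
static size_t
count_ascii_static(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0x80) {
            count++;
        }
    }
    return count;
}

/* External linkage (hypothetical name). In a split like the one
 * discussed here, this prototype would live in an internal header
 * so that a second .c file can call the function. */
size_t count_ascii_extern(const unsigned char *s, size_t n);

size_t
count_ascii_extern(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0x80) {
            count++;
        }
    }
    return count;
}

/* Callers in the same file: both calls may be inlined, but only
 * the extern helper must remain callable from other files. */
size_t
caller_using_static(const unsigned char *s, size_t n)
{
    return count_ascii_static(s, n);
}

size_t
caller_using_extern(const unsigned char *s, size_t n)
{
    return count_ascii_extern(s, n);
}
```

Both helpers compute the same result; the difference is only in what the compiler and the rest of the code base are allowed to do with them. Whether the extra out-of-line copy changes inlining decisions in practice depends on the compiler and build flags, so this sketch shows the mechanism, not a measured effect.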

vstinner · 2025-12-03T16:19:54Z

An alternative is to put all codecs in a single file: #141469 (6,671 lines of C code).

vstinner requested review from a team, AA-Turner, emmatyping, ericsnowcurrently and erlend-aasland as code owners December 2, 2025 13:49

bedevere-app bot added the awaiting core review label Dec 2, 2025

bedevere-app bot mentioned this pull request Dec 2, 2025

Split large Objects/unicodeobject.c file into smaller files #139353

Open

vstinner added the skip news label Dec 2, 2025

Fix make check-c-globals

b76a704

vstinner wants to merge 2 commits into python:main from vstinner:unicode_codecs_utf
